Add read-only Config endpoint #9497

22quinn · 2020-06-24T11:58:52Z

Make sure to mark the boxes below before creating PR: [x]

Description above provides context of the change
Unit tests coverage for changes (not needed for documentation changes)
Target Github ISSUE in description if exists
Commits follow "How to write a good git commit message"
Relevant documentation is updated including usage instructions.
I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

22quinn · 2020-06-24T12:05:24Z

I see there are pagination parameters:

airflow/airflow/api_connexion/openapi/v1.yaml

Lines 1136 to 1138 in d531cd6

    
    parameters:
      - $ref: '#/components/parameters/PageLimit'
      - $ref: '#/components/parameters/PageOffset'

But to me it seems weird to have pagination for config. I prefer to remove it. WDYT? @mik-laj

mik-laj · 2020-06-24T12:08:21Z

@zikun I agree. That's weird. Let's delete these parameters.

airflow/api_connexion/endpoints/config_endpoint.py

mik-laj · 2020-06-24T18:39:32Z

airflow/api_connexion/openapi/v1.yaml

@@ -1760,6 +1757,9 @@ components:
        value:
          type: string
          readOnly: true
+        source:


I think they can't do anything about this information. They will not change their behavior because the information comes from environment variable.

This is inspired by the table in the web configuration page, which has four columns - section, key, value and source. Isn't source information useful for admin users to change and debug the configuration? Especially when it comes from multiple sources like airflow.cfg, env var, cmd.

I am not sure if we will be able to maintain the backward compatibility of the API for this field. In my opinion, the value of this field in the API is low because it refers to values that the API client cannot influence in any way. It may help with debugging, but the main goal of the API is to facilitate management, not troubleshooting.

A similar situation is with the Job table, which is not present in API, and access to it allows us to solve troubleshooting issues, but this table is not relevant for third-party systems and has not been included in the API specification. Each field/endpoint in the API is opt-in, not opt-out, to facilitate backward compatibility.

If you want to make field decisions, think about whether this field will be relevant when you have 100 Airflow instances. In this case, you need a different view of the data stored in the system. You may care about what the value of a configuration option looks like, e.g. to compare instances, but the source of the value is a technical detail.

We can add additional endpoints that allow access to more detailed data in the future, but these endpoints will have to be specially marked to ensure a level of stability.

I see where you are coming from. I think I am not clear on the main use case of this endpoint. Do you mind giving a specific example on what this endpoint might be used for? Like what do people do after they query GET /config from 100 Airflow instances?

Airflow has options that have a big impact on instance performance and resource usage.

    parallelism = 32
    dag_concurrency = 16
    max_active_runs_per_dag = 16
    dag_file_processor_timeout = 50
    scheduler_heartbeat_sec = 5
    job_heartbeat_sec = 5
    processor_poll_interval = 1
    min_file_process_interval = 0
    dag_dir_list_interval = 300
    etc.

Users may want to read these values and then combine them with data from other applications (e.g. Stackdriver, Zabbix, Prometheus), such as average CPU usage, average memory usage, etc. This will allow us to make recommendations on the changes that should be made to improve the health of the instance.
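The use case described above could look roughly like the following. This is only a minimal sketch, assuming the plain-text response is INI-like (`[section]` headers followed by `key = value` lines). The sample text, the `TUNING_OPTIONS` list and the `extract_tuning_options` helper are hypothetical and not part of this PR.

```python
import configparser

# Hypothetical sample of what a plain-text config response might contain.
SAMPLE_RESPONSE = """\
[core]
parallelism = 32
dag_concurrency = 16

[scheduler]
scheduler_heartbeat_sec = 5
"""

# Hypothetical selection of options a monitoring tool might care about.
TUNING_OPTIONS = [
    ("core", "parallelism"),
    ("core", "dag_concurrency"),
    ("scheduler", "scheduler_heartbeat_sec"),
]


def extract_tuning_options(config_text):
    """Parse INI-style text and return the selected options as integers."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {
        f"{section}.{key}": parser.getint(section, key)
        for section, key in TUNING_OPTIONS
        if parser.has_option(section, key)
    }


tuning = extract_tuning_options(SAMPLE_RESPONSE)
print(tuning)
```

The resulting flat dictionary is the kind of record that could then be joined with CPU or memory metrics collected elsewhere.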

Thanks. I removed it

22quinn · 2020-06-25T05:34:39Z

Hi @mik-laj I need some help for the unit test. Because the original airflow.configuration.conf variable contains many sections and options, the expected API response is too long to put in the test. It is also not maintainable as the default config can change. So I want to use a small conf to mock it. I tried both pytest monkeypatch and mock.patch, but the API still returns the original config. Any idea?

mik-laj · 2020-06-25T05:36:33Z

@zikun I'm starting to look at it

mik-laj · 2020-06-25T06:28:47Z

I think we need to give up one response format.
spec-first/connexion#860

mik-laj · 2020-06-25T06:40:57Z

@zikun Here is an example of testing using mock
mik-laj@48e4127

mik-laj · 2020-06-25T06:42:23Z

I think we need to give up one response format.
spec-first/connexion#860

This is weird because Dag Source and Log use different types of responses and it probably works there.

22quinn · 2020-06-25T10:19:39Z

@zikun Here is an example of testing using mock
mik-laj@48e4127

Thanks a lot for the example. It did not work because I was mocking conf variable. Mocking as_dict function works!
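A minimal sketch of the difference, using stand-ins rather than Airflow's real objects: `FakeConf`, `get_config` and `SMALL_CONF` are hypothetical names. One common reason that replacing a variable has no effect is that the code under test already holds its own reference to the original object, whereas patching a method on that object is visible through every reference.

```python
from unittest import mock


class FakeConf:
    """Stand-in for a large configuration object (not Airflow's class)."""

    def as_dict(self):
        return {"core": {"parallelism": "32"}, "scheduler": {"job_heartbeat_sec": "5"}}


conf = FakeConf()


def get_config():
    """Stand-in for an endpoint handler that reads the module-level `conf`."""
    return conf.as_dict()


# Small, stable config used only by the test.
SMALL_CONF = {"core": {"parallelism": "1024"}}

# Patch the method on the existing object so every holder of `conf` sees it.
with mock.patch.object(conf, "as_dict", return_value=SMALL_CONF):
    patched = get_config()

# Outside the context manager the original method is back in place.
restored = get_config()
print(patched, restored)
```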

I think we need to give up one response format.
zalando/connexion#860

This is weird because Dag Source and Log use different types of responses and it probably works there.

I just tried testing with both json and text/plain response types. The json test failed. I'm looking into the dag source and log PRs now to find differences that lead to the failure.

22quinn · 2020-06-25T11:08:49Z

I fixed the json test. Now it works for both text/plain and json.

Now there's only one pylint test failing

tests/api_connexion/endpoints/test_config_endpoint.py:45:8: W0201: Attribute 'client' defined outside init (attribute-defined-outside-init)

I converted unittest to pytest as I remember there was a discussion to move away from unittest to pytest.
Can I make client a class attribute by moving it to setup_class()? Was there any reason to put it in setUp() rather than setUpClass() for unittest?
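On the setUp() versus setUpClass() question, the usual trade-off is test isolation against setup cost. The sketch below illustrates it with the standard library only; `FakeClient`, the two test case classes and `clients_built_by` are hypothetical and stand in for the real test client.

```python
import io
import unittest


class FakeClient:
    """Hypothetical stand-in for an expensive test client."""

    created = 0  # how many instances have been built so far

    def __init__(self):
        FakeClient.created += 1


class SharedClientCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the class; all tests share the same client.
        cls.client = FakeClient()

    def test_first(self):
        self.assertIsInstance(self.client, FakeClient)

    def test_second(self):
        self.assertIsInstance(self.client, FakeClient)


class FreshClientCase(unittest.TestCase):
    def setUp(self):
        # Runs before every test; each test gets its own client.
        self.client = FakeClient()

    def test_first(self):
        self.assertIsInstance(self.client, FakeClient)

    def test_second(self):
        self.assertIsInstance(self.client, FakeClient)


def clients_built_by(case):
    """Run one test case class and report how many clients it created."""
    FakeClient.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return FakeClient.created


shared_count = clients_built_by(SharedClientCase)
fresh_count = clients_built_by(FreshClientCase)
print(shared_count, fresh_count)
```

The shared variant builds the client once, which is cheaper, but any state a test leaves on the client is visible to later tests. The per-test variant pays the setup cost each time in exchange for isolation.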

22quinn · 2020-06-25T13:22:47Z

All checks passed @mik-laj

mik-laj · 2020-06-25T13:31:30Z

I finished work today. Please ping me tomorrow.

turbaszek · 2020-06-25T14:31:50Z

airflow/api_connexion/endpoints/config_endpoint.py

+        config_text = '\n'.join(
+            f'[{config_section.name}]\n' +
+            ''.join(f'{config_option.key} = {config_option.value}  # source: {config_option.source}\n'
+                    for config_option in config_section.options)
+            for config_section in config.sections
+        )


What would you say to use some helper methods like:

    def _make_single_record(config_option):
        return f'{config_option.key} = {config_option.value}  # source: {config_option.source}\n'

    def _make_single_section(config_section):
        # Join the records explicitly; a bare generator inside the f-string would not render them.
        records = ''.join(_make_single_record(o) for o in config_section.options)
        return f'[{config_section.name}]\n{records}'

    def _config_to_plain_text(config):
        return '\n'.join(_make_single_section(s) for s in config.sections)

I do not see the benefits of such gradation.

I think we can split it in a different way.

    # func1 and func2 stand for the per-format serializer functions.
    serializer = {
        'text/plain': func1,
        'application/json': func2,
    }
    conf_dict = conf.as_dict()
    config = conf_dict_to_config(conf_dict)
    return_type = request.accept_mimetypes.best_match(serializer.keys())
    if return_type not in serializer:
        return Response(status=406)
    config_text = serializer[return_type](config)
    return Response(config_text, headers={'Content-Type': return_type})

Thanks. I think both of your suggestions are good. We can combine them.

It's good to break into smaller functions especially as they handle different scopes, just like having nested classes for ConfigSchema. One benefit I can think of is in case we want to offer smaller endpoints like /config/{section}/{option}, we can easily make use of those small functions.
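Combining the two suggestions might look roughly like this. It is a sketch under assumptions, not the merged implementation: the NamedTuple classes, `conf_dict_to_config`, the helper names and `render_config` are illustrative, the `source` field is left out, and the exact-match lookup on the media type is a simplification of the `request.accept_mimetypes.best_match(...)` negotiation mentioned above.

```python
import json
from typing import Dict, List, NamedTuple, Tuple


class ConfigOption(NamedTuple):
    key: str
    value: str


class ConfigSection(NamedTuple):
    name: str
    options: List[ConfigOption]


class Config(NamedTuple):
    sections: List[ConfigSection]


def conf_dict_to_config(conf_dict: Dict[str, Dict[str, str]]) -> Config:
    """Convert a nested {section: {key: value}} dict into NamedTuples."""
    return Config(sections=[
        ConfigSection(name=name, options=[ConfigOption(k, v) for k, v in options.items()])
        for name, options in conf_dict.items()
    ])


def _option_to_text(option: ConfigOption) -> str:
    return f"{option.key} = {option.value}\n"


def _section_to_text(section: ConfigSection) -> str:
    return f"[{section.name}]\n" + "".join(_option_to_text(o) for o in section.options)


def _config_to_text(config: Config) -> str:
    return "\n".join(_section_to_text(s) for s in config.sections)


def _config_to_json(config: Config) -> str:
    return json.dumps({
        "sections": [
            {"name": s.name, "options": [o._asdict() for o in s.options]}
            for s in config.sections
        ]
    })


# One serializer per supported media type.
SERIALIZERS = {
    "text/plain": _config_to_text,
    "application/json": _config_to_json,
}


def render_config(conf_dict: Dict[str, Dict[str, str]], accept: str) -> Tuple[int, str]:
    """Return (status, body); status 406 when the media type is unsupported."""
    serializer = SERIALIZERS.get(accept)
    if serializer is None:
        return 406, ""
    return 200, serializer(conf_dict_to_config(conf_dict))


status, body = render_config({"core": {"parallelism": "32"}}, "text/plain")
print(status)
print(body)
```

Keeping the per-option and per-section helpers separate means a narrower endpoint such as /config/{section}/{option} could reuse them without touching the dispatch table.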

Co-authored-by: Tomek Urbaszek <turbaszek@apache.org> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com> Co-authored-by: Tomek Urbaszek <turbaszek@apache.org> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

22quinn added 4 commits June 24, 2020 19:20

Add config schema

bc5f5e1

Add source to ConfigOptionSchema

61d72c2

Add config endpoint

2599168

Support plain text config response

d531cd6

boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jun 24, 2020

22quinn marked this pull request as draft June 24, 2020 11:59

mik-laj reviewed Jun 24, 2020

22quinn added 4 commits June 24, 2020 20:13

Remove config pagenization

1a93b85

Use NamedTuple

987746a

Generate config text from NamedTuple

dbfb341

Add test for config schema

a10c246

mik-laj reviewed Jun 24, 2020

Add tests using mock

48e4127

Use pytest

de3c2b6

Fix json test

344943b

22quinn force-pushed the api-config-endpoint branch from c18722d to 344943b Compare June 25, 2020 10:38

22quinn changed the title ~~[WIP] Add read-only Config endpoint~~ Add read-only Config endpoint Jun 25, 2020

22quinn marked this pull request as ready for review June 25, 2020 11:00

Fix pylint attribute-defined-outside-init

8446526

turbaszek reviewed Jun 25, 2020

Refactor config functions

72c6ef4

Co-authored-by: Tomek Urbaszek <turbaszek@apache.org> Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>

22quinn force-pushed the api-config-endpoint branch from e2e0f35 to 045fbbb Compare June 26, 2020 03:56

Remove source from config option

b9dcc7d

22quinn force-pushed the api-config-endpoint branch from 045fbbb to b9dcc7d Compare June 26, 2020 04:00

22quinn requested a review from mik-laj June 26, 2020 11:42

mik-laj approved these changes Jun 26, 2020

mik-laj merged commit f729cfd into apache:master Jun 26, 2020

22quinn deleted the api-config-endpoint branch June 27, 2020 02:39
